Modeling the Performance of Geometric Multigrid on Many-core Computer Architectures

Author

  • PIETER GHYSELS
Abstract

The basic building blocks of the classic geometric multigrid algorithm are all essentially stencil computations and have a low ratio of executed floating point operations per byte fetched from memory. On modern computer architectures, such computational kernels are typically bound by memory traffic and achieve only a small percentage of the theoretical peak floating point performance of the underlying hardware. We suggest the use of state-of-the-art (stencil) compiler techniques to improve the flop per byte ratio, also called the arithmetic intensity, of the steps in the algorithm. Our focus is on the smoother, which is a repeated stencil application. With a tiling approach based on the polyhedral loop optimization framework, data reuse in the smoother can be improved, leading to a higher effective arithmetic intensity. For an academic problem, we present a performance model for the multigrid V-cycle solver based on the tiled smoother. As the number of smoothing steps increases, there is a trade-off between the improved efficiency due to better data reuse and the additional flops required for the extra smoothing steps. Our performance model predicts time to solution by linking convergence rate to arithmetic intensity via the roofline model. We show results for 2D and 3D simulations on Intel Sandy Bridge and Intel Xeon Phi architectures. The actual performance is compared with the theoretical predictions.
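The trade-off described in the abstract can be illustrated with a minimal sketch. This is not the paper's performance model. It assumes, purely for illustration, that tiling makes the arithmetic intensity grow linearly with the number of smoothing steps, that the per-cycle convergence factor is the per-step factor raised to the number of steps, and that only fine-grid smoothing work counts. All function names and machine numbers are hypothetical.

```python
import math

def roofline_gflops(ai, peak_gflops, bandwidth_gbs):
    """Roofline model: attainable GFLOP/s is capped either by the
    compute peak or by memory bandwidth times arithmetic intensity."""
    return min(peak_gflops, ai * bandwidth_gbs)

def cycles_to_tolerance(rho_per_step, nu, tol):
    """V-cycles needed to reduce the residual by a factor `tol`.
    ASSUMPTION: convergence factor per cycle is rho_per_step ** nu."""
    rho_cycle = rho_per_step ** nu
    return math.ceil(math.log(tol) / math.log(rho_cycle))

def time_to_solution(nu, n_points, flops_per_point, ai_one_step,
                     peak_gflops, bandwidth_gbs, rho_per_step, tol):
    """Estimated seconds to reach `tol` with nu smoothing steps per cycle.
    ASSUMPTIONS: ideal tiling, so arithmetic intensity = nu * ai_one_step;
    only fine-grid smoothing flops are counted."""
    ai = nu * ai_one_step
    gflops = roofline_gflops(ai, peak_gflops, bandwidth_gbs)
    flops_per_cycle = nu * flops_per_point * n_points
    seconds_per_cycle = flops_per_cycle / (gflops * 1e9)
    return cycles_to_tolerance(rho_per_step, nu, tol) * seconds_per_cycle

if __name__ == "__main__":
    # Hypothetical machine: 100 GFLOP/s peak, 40 GB/s memory bandwidth.
    for nu in (1, 2, 4, 8):
        t = time_to_solution(nu, n_points=1_000_000, flops_per_point=8,
                             ai_one_step=0.5, peak_gflops=100.0,
                             bandwidth_gbs=40.0, rho_per_step=0.5, tol=1e-3)
        print(f"nu={nu}: estimated time {t:.2e} s")
```

Under these assumptions, extra smoothing steps cost no additional time while the kernel remains memory bound, so the estimated time to solution falls until the compute roof is reached. A realistic model would replace the assumed convergence factor and arithmetic intensity with measured values.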


Similar resources

Ultra-Low-Energy DSP Processor Design for Many-Core Parallel Applications

Background and Objectives: Digital signal processors are widely used in energy constrained applications in which battery lifetime is a critical concern. Accordingly, designing ultra-low-energy processors is a major concern. In this work and in the first step, we propose a sub-threshold DSP processor. Methods: As our baseline architecture, we use a modified version of an existing ultra-low-power...


Analysis of a flat Highly Parallel Geometric Multigrid Algorithm for Hierarchical Hybrid Grids

While multicore architectures are becoming usual on desktop machines, supercomputers are approaching a million cores. The amount of memory and compute power on current clusters enables us, e.g., to obtain a resolution in excess of (10 000)^3 = 10^12 degrees of freedom. However, on the downside we are forced to partition our domain into extremely many sub-problems. Portions of the algorithm that do not perm...


Efficient Finite Element Geometric Multigrid Solvers for Unstructured Grids on GPUs

Fast, robust and efficient multigrid solvers are a key numerical tool in the solution of partial differential equations discretised with finite elements. The vast majority of practical simulation scenarios requires that the underlying grid is unstructured, and that high-order discretisations are used. On the other hand, hardware is quickly evolving towards parallelism and heterogeneity, even wi...


Implementation and Optimization of miniGMG — a Compact Geometric Multigrid Benchmark

Multigrid methods are widely used to accelerate the convergence of iterative solvers for linear systems used in a number of different application areas. In this report, we describe miniGMG, our compact geometric multigrid benchmark designed to proxy the multigrid solves found in AMR applications. We explore optimization techniques for geometric multigrid on existing and emerging multicore syste...


Function-based Algebraic Multigrid method for the 3D Poisson problem on structured meshes

Multilevel methods, such as Geometric and Algebraic Multigrid, Algebraic Multilevel Iteration, Domain Decomposition-type methods have been shown to be the methods of choice for solving linear systems of equations, arising in many areas of Scientific Computing. The methods, in particular the multigrid methods, have been efficiently implemented in serial and parallel and are available via many sc...



Publication date: 2013